A Semantic Schema for GeoNames
Abstract
As part of a broader strategy for supporting semantic interoperability in geospatial applications, this paper presents a semantic schema that we designed for GeoNames and the qualitative improvements we obtained by enforcing it on the data.
Introduction. GeoNames (www.geonames.org) is a well-known geospatial dataset providing geographical data and metadata for around 7 million unique named places from all over the world, collected from several sources. At the top level, places are categorized into 9 broad feature classes, further divided into 663 features arranged in a flat list with no relations between them. A special null class contains unclassified entities. Each class is associated with one name and often a natural language description. Yet a fixed terminology is an obstacle to achieving semantic interoperability [6]. For example, if it is decided that the standard term for a terminal where subways load and unload passengers is metro station, the standard fails in applications where the same concept is denoted by subway station. This weakness has been identified as one of the key issues for the future of the INSPIRE implementation [8, 9, 10, 11].
As part of the solution, geospatial ontologies, which provide alternative terms and semantic relations between them, represent a more flexible alternative [12, 13, 18]. They can essentially be seen as semantic standards. Following this line, in our previous work [4, 5, 3] we proposed a methodology and a minimal set of guiding principles based on the faceted approach, as originally used in library science [14], and developed a large-scale multilingual geospatial faceted ontology obtained from the refinement and extension of GeoNames, WordNet (wordnet.princeton.edu) and MultiWordNet (multiwordnet.fbk.eu). It accounts for the relevant classes, entities, their relations and attributes arranged into facets, each of them capturing a different aspect of the geospatial domain.
For instance, it includes the facets land formation, body of water and populated place, with corresponding more specific classes (exemplified in the picture aside). Following the faceted approach is known to guarantee the construction of very high quality ontologies in terms of robustness, extensibility, reusability, compactness and flexibility [15, 16]. This approach has proven effective in geospatial applications. It is worth mentioning, for instance, the benefits obtained from using such ontologies within the discovery service of the semantic geocatalogue of the Autonomous Province of Trento in Italy [1, 17]. This work also laid the basis for the release of its geographical data and metadata as linked open government data [2].
Nevertheless, the use of a geospatial ontology does not solve all problems. In fact, GeoNames seems to lack sufficient constraints on the domain and range of the attributes, as well as corresponding mechanisms to enforce them, which could guarantee adequate data quality. For instance, such constraints should prevent the attribute population from having a negative value; and while it is fine for cities to have this attribute, it should be prevented for streams. This deficiency results in some unexpected mistakes. The solution we adopt is what we call a semantic schema.
[Figure: excerpt of three facets with their more specific classes. Landform: natural depression, oceanic depression, oceanic valley, oceanic trough, continental depression, trough, valley, natural elevation, oceanic elevation, seamount, submarine hill, continental elevation, hill, mountain. Body of water: flowing body of water, stream, brook, river, still body of water, pond, lake. Populated place: city, town, village.]
The semantic schema. In this setting, we define a semantic schema as a set of constraints on the domain and range of the attributes (e.g. population) and the relations (e.g. capital) in the dataset.
In particular, the schema is semantic-aware because the domain of attributes and relations, and the range of relations, are always a class together with its more specific classes taken from the geospatial ontology. For instance, if we specify that the domain of the attribute population is populated place (the main class), we assume that it also applies to city, town and village (more specific classes in the ontology). In the specific case of GeoNames, the range of attributes is instead a standard data type (e.g. integer, float or string). The purpose of the schema is expressly to define what is legal in terms of attributes, relations and corresponding values. Enforcing the schema corresponds to verifying the consistency of the dataset w.r.t. such constraints (see, e.g. [7]). Among others, the schema we defined includes the following constraints:
| Attribute name | Definition | Domain (main class) | Range |
| Population | the people who inhabit a territory or state | Populated place | Long > 0 |
| Altitude | elevation above sea level | Location, except Undersea | Float in [-423, 8848] |
| Elevation | vertical distance above a reference point | Undersea | Float |
| Area | the extent of a 2-dimensional surface enclosed within a boundary | Location | Float > 0 |
| Capital | a seat of government | Geo-political entity | Populated place |
Note in particular that we distinguish between elevation and altitude, and separate the former from the latter when the distinction is clear from the domain. GeoNames, by contrast, provides only elevation. In fact, while elevation refers to a generic distance from a reference point, altitude is a more specific notion, as in this case the reference point is the sea level. The range of altitude was set by referring to the altitude of the Dead Sea (the lowest) and Mount Everest (the highest) as taken from Wikipedia.
Enforcing the schema led to some surprising results.
For instance:
• Although GeoNames assumes that elevation is not meant to be provided for oceanic entities, we found that 2,934 entities (e.g., Mentawai Ridge) of 33 different undersea classes (e.g., oceanic ridge, oceanic valley) actually have a value for it. We keep these values in the ontology, separating them from altitude.
• In GeoNames the Dead Sea is represented with a negative altitude of −405 m. Surprisingly, GeoNames contains 45 other locations with the same altitude as the Dead Sea, and two further locations are reported to be even lower than the Dead Sea (Nahal Amazyahu and `Arvat Sedom). Manual checks were needed to verify their correctness.
• The domain of population includes several unexpected classes such as airport, stream and garden. We removed population from the corresponding entities in the ontology.
• We found several entities with elevation set to -9999, the value used in GeoNames to encode an unknown value. We removed elevation from the corresponding entities in the ontology.
• In the range of capital, 3 entities are registered as cities (e.g. Jerusalem) while all the others are registered as capitals. This is not wrong, but it is not homogeneous. Since no location is essentially a capital (the capital of a country may change over time; see also [19] about the distinction between rigid and non-rigid properties), we set the corresponding class to populated place for all of them.
• The area of United States Minor Outlying Islands is set to 0. We corrected it to 34200 m as reported in Wikipedia.
Conclusions. In this paper we have stressed the need for an integrated approach to effectively support semantic interoperability between different geospatial applications.
The proposed solution consists of the use of a geospatial faceted ontology providing the terminology of the geospatial domain (which can be seen as a sort of more flexible semantic standard) and a semantic schema that, by establishing precise constraints on the domain and range of the attributes and the relations, guarantees a higher level of data quality.
Acknowledgments. The research leading to these results has received funding from the CUbRIK Collaborative Project, partially funded by the European Commission's 7th Framework ICT Programme for Research and Technological Development under the Grant agreement no. 287704.
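Illustrative sketch. The following Python fragment is not the authors' implementation. It is a minimal sketch of how the attribute constraints of the semantic schema could be checked, assuming a tiny hand-written class hierarchy. The names `PARENT`, `is_a`, `AttributeConstraint`, `SCHEMA` and `violations` are hypothetical, and the direct parent links (for example, stream placed directly under body of water, and undersea under location) are simplifications made for illustration. Only the domains, ranges and the -9999 sentinel come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical, simplified fragment of the ontology: class -> parent class.
PARENT: Dict[str, Optional[str]] = {
    "location": None,
    "populated place": "location",
    "city": "populated place",
    "town": "populated place",
    "village": "populated place",
    "body of water": "location",
    "stream": "body of water",
    "undersea": "location",
    "oceanic ridge": "undersea",
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls is ancestor or one of its more specific classes."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class AttributeConstraint:
    domain: str                         # main class; subclasses are included
    in_range: Callable[[float], bool]   # predicate over the attribute value
    excluded: Optional[str] = None      # class carved out of the domain


# Domains and ranges follow the constraints listed in the paper.
SCHEMA: Dict[str, AttributeConstraint] = {
    "population": AttributeConstraint("populated place", lambda v: v > 0),
    "altitude": AttributeConstraint(
        "location", lambda v: -423 <= v <= 8848, excluded="undersea"
    ),
    "elevation": AttributeConstraint("undersea", lambda v: True),
    "area": AttributeConstraint("location", lambda v: v > 0),
}

UNKNOWN = -9999  # value used in GeoNames to encode an unknown elevation


def violations(entity_class: str, attributes: Dict[str, float]) -> List[str]:
    """Return one message per attribute that breaks the schema."""
    problems: List[str] = []
    for name, value in attributes.items():
        rule = SCHEMA.get(name)
        if rule is None:
            problems.append(f"{name}: not in schema")
            continue
        in_domain = is_a(entity_class, rule.domain) and not (
            rule.excluded is not None and is_a(entity_class, rule.excluded)
        )
        if not in_domain:
            problems.append(f"{name}: not allowed for {entity_class}")
        elif value == UNKNOWN or not rule.in_range(value):
            problems.append(f"{name}: value {value} out of range")
    return problems
```

Under these assumptions, `violations("stream", {"population": 50})` reports a domain violation, while `violations("city", {"population": 120000})` returns an empty list because city inherits the domain of populated place. Walking the parent chain in `is_a` is what makes the check semantic-aware in the sense described in the paper: a constraint stated once on a main class is applied to all of its more specific classes.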